    SOFOS: Demonstrating the Challenges of Materialized View Selection on Knowledge Graphs

    Analytical queries over RDF data are becoming prominent as a result of the proliferation of knowledge graphs. Yet RDF databases are not optimized to perform such queries efficiently, which leads to long processing times. A well-known technique for improving the performance of analytical queries is to exploit materialized views. Although popular in relational databases, view materialization for RDF and SPARQL has not yet transitioned into practice, because applying it to the RDF graph model is non-trivial. Motivated by a lack of understanding of the impact of view materialization alternatives for RDF data, we demonstrate SOFOS, a system that implements and compares several cost models for view materialization. To the best of our knowledge, SOFOS is the first attempt to adapt cost models initially studied for relational data to the generic RDF setting and to propose new ones, analyzing their pitfalls and merits. SOFOS takes an RDF dataset and an analytical query over some facet of the data, compares and evaluates alternative cost models, and displays statistics and insights about time, memory consumption, and query characteristics.
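    The abstract does not say which cost models SOFOS implements. As an illustration of the kind of relational cost model such a system could adapt, here is a minimal Python sketch of the classic linear cost model with greedy, benefit-based view selection from the data-cube literature (Harinarayan, Rajaraman and Ullman). A view is a set of grouping facets, a query can be answered from any view whose facets are a superset of its own, and answering it costs the size of the smallest such materialized view. We assume every view in the lattice is queried equally often; the facet names and size estimates are hypothetical. This is a sketch under those assumptions, not SOFOS's implementation.

```python
def greedy_view_selection(sizes, k):
    """Pick up to k views to materialize under a linear cost model (HRU-style greedy).

    sizes: dict mapping a frozenset of grouping facets to an estimated row count.
    Assumptions: every view in the lattice is also a query with equal weight,
    and the top view (all facets) is always materialized.
    """
    top = max(sizes, key=len)
    materialized = {top}

    def cost(query):
        # Cost of a query = size of the smallest materialized view that can answer it.
        return min(sizes[v] for v in materialized if query <= v)

    for _ in range(k):
        best, best_benefit = None, 0
        for view in sizes:
            if view in materialized:
                continue
            # Benefit: total cost reduction over all queries this view can answer.
            benefit = sum(max(0, cost(q) - sizes[view]) for q in sizes if q <= view)
            if benefit > best_benefit:
                best, best_benefit = view, benefit
        if best is None:  # no remaining view reduces any query cost
            break
        materialized.add(best)
    return materialized


# Hypothetical facets of an RDF dataset and estimated group-by result sizes.
sizes = {
    frozenset({"country", "year"}): 100,
    frozenset({"country"}): 50,
    frozenset({"year"}): 90,
    frozenset(): 1,
}
print(sorted(sorted(v) for v in greedy_view_selection(sizes, 2)))
# → [[], ['country'], ['country', 'year']]
```

    The single global aggregate (the empty facet set) is chosen second even though it answers only one query, because materializing it makes that query almost free; this is the kind of trade-off a cost model must weigh.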

    RDF Digest: Ontology Exploration Using Summaries

    Ontology summarization aspires to produce an abridged version of the original ontology that highlights its most representative concepts. In this paper, we present RDF Digest, a novel platform that automatically produces and visualizes summaries of RDF/S Knowledge Bases (KBs). A summary is a valid RDFS document/graph that includes the most representative concepts of the schema, adapted to the corresponding instances. To construct this graph, our algorithm exploits the semantics and the structure of the schema and the distribution of the corresponding data/instances. A novel feature of our platform is that it allows summary exploration through extensible summaries. The aim of this demonstration is to dive into the exploration of the sources using summaries and to enhance the understanding of the various algorithms used.
    Introduction
    Given the explosive growth in both data size and schema complexity, data sources are becoming increasingly difficult to understand and use. Ontologies often have extremely complex schemas that are difficult to comprehend, limiting the potential for exploring and exploiting the information they contain. Beyond the schema, the large amount of data in these sources increases the effort required to explore them. In recent years, various techniques have been proposed for constructing overviews of ontologies [1-4] that nevertheless retain the most important ontology elements. These overviews are provided by means of an ontology summary. Ontology summarization [4] is defined as the process of distilling knowledge from an ontology in order to produce an abridged version. While summaries are useful, creating a "good" summary is a non-trivial task. A summary should be concise, yet it needs to convey enough information to enable a decent understanding of the original schema. Moreover, the summarization should be coherent and should provide extensive coverage of the entire ontology.
    So far, although a reasonable number of research works have tried to address the problem of summarization from different angles, a solution that simultaneously exploits the semantics of the schemas and the data instances is still missing. In this demonstration, we focus on RDF/S KBs and demonstrate for the first time the implementation of the algorithms introduced i
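    The abstract only says that the summarization algorithm combines the semantics and structure of the schema with the distribution of the instances. A minimal Python sketch of that idea, under our own assumptions: each class gets a hypothetical relevance score (its degree in the schema graph times its share of all instances), the top-k classes are kept, and each is connected to the highest-scoring class through a shortest schema path so that the summary remains a connected RDFS graph. RDF Digest's actual relevance measure and linking strategy are not described here and may differ.

```python
from collections import deque


def relevance(schema_edges, instance_counts):
    """Hypothetical relevance: schema degree (structure) times instance share (data)."""
    degree = {}
    for s, t in schema_edges:
        degree[s] = degree.get(s, 0) + 1
        degree[t] = degree.get(t, 0) + 1
    total = sum(instance_counts.values()) or 1
    return {c: degree[c] * instance_counts.get(c, 0) / total for c in degree}


def shortest_path(adj, src, dst):
    """Unweighted shortest path by BFS; returns [] if dst is unreachable."""
    prev = {src: None}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            break
        for nxt in sorted(adj.get(node, ())):
            if nxt not in prev:
                prev[nxt] = node
                queue.append(nxt)
    if dst not in prev:
        return []
    path, node = [], dst
    while node is not None:
        path.append(node)
        node = prev[node]
    return path[::-1]


def summarize(schema_edges, instance_counts, k):
    scores = relevance(schema_edges, instance_counts)
    top = sorted(scores, key=lambda c: (-scores[c], c))[:k]
    adj = {}
    for s, t in schema_edges:
        adj.setdefault(s, set()).add(t)
        adj.setdefault(t, set()).add(s)
    nodes = set(top)
    # Connect every selected class to the best-scoring one, pulling in the
    # intermediate classes so the summary is a connected subgraph.
    for c in top[1:]:
        nodes.update(shortest_path(adj, top[0], c))
    edges = {(s, t) for s, t in schema_edges if s in nodes and t in nodes}
    return nodes, edges


# Hypothetical schema (class-to-class properties) and instance counts.
schema = [("Person", "Organization"), ("Person", "Place"),
          ("Organization", "Place"), ("Event", "Place"), ("Publication", "Person")]
counts = {"Person": 500, "Organization": 100, "Place": 50, "Event": 300, "Publication": 50}
print(summarize(schema, counts, 2))
```

    Here the two most relevant classes, Person and Event, are not directly linked in the schema, so the sketch pulls in Place to keep the summary connected.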

    Coverage-Based Summaries for RDF KBs

    As more and more data become available as linked data, the need for efficient and effective methods for exploring them becomes apparent. Semantic summaries try to extract meaning from data while reducing its size. State-of-the-art structural semantic summaries focus primarily on the graph structure of the data, trying to maximize the summary’s utility for query answering, i.e., the query coverage. In this poster paper, we present an algorithm that tries to maximize this query coverage using ideas borrowed from result diversification. The key idea of our algorithm is that, instead of focusing only on the “central” nodes, it also pushes node selection toward the perimeter of the graph. Our experiments show the potential of our algorithm and demonstrate considerable advantages in answering larger fragments of user queries.
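    To illustrate how result diversification can push node selection toward the perimeter, here is a minimal Python sketch of a greedy selection in the style of maximal marginal relevance, a common diversification technique. Each candidate's score mixes its normalized degree (as a stand-in for centrality) with its normalized hop distance to the nodes already selected, weighted by a hypothetical parameter `lam`. The choice of degree and this particular scoring formula are our assumptions, not the poster's algorithm.

```python
from collections import deque


def bfs_distances(adj, src):
    """Hop distances from src to every reachable node."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        for nxt in adj[node]:
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist


def diversified_selection(adj, k, lam=0.5):
    """Greedy MMR-style node selection: centrality traded off against spread."""
    if k <= 0 or not adj:
        return []
    degree = {n: len(adj[n]) for n in adj}
    max_deg = max(degree.values()) or 1
    dist = {n: bfs_distances(adj, n) for n in adj}
    diameter = max(max(d.values()) for d in dist.values()) or 1
    # Start from the most central node (ties broken alphabetically).
    selected = [max(sorted(adj), key=degree.get)]

    def score(n):
        relevance = degree[n] / max_deg
        # Distance to the closest selected node; unreachable counts as the diameter.
        spread = min(dist[s].get(n, diameter) for s in selected) / diameter
        return lam * relevance + (1 - lam) * spread

    while len(selected) < min(k, len(adj)):
        candidates = sorted(n for n in adj if n not in selected)
        selected.append(max(candidates, key=score))
    return selected


# Hypothetical graph: hub A with a chain D-E-F leading to the perimeter.
graph = {"A": ["B", "C", "D"], "B": ["A"], "C": ["A"],
         "D": ["A", "E"], "E": ["D", "F"], "F": ["E"]}
print(diversified_selection(graph, 3))           # → ['A', 'E', 'D']
print(diversified_selection(graph, 2, lam=1.0))  # → ['A', 'D']
```

    With `lam=1.0` the score reduces to pure centrality and the second pick is D, the hub's neighbour; with `lam=0.5` the sketch prefers E, one hop further from the hub, so the summary reaches deeper into the graph.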

    SumMER: Structural Summarization for RDF/S KGs

    Knowledge graphs are becoming more and more prevalent on the web, ranging from small taxonomies to large knowledge bases containing a vast amount of information. To construct such knowledge graphs, either automatically or manually, tools for their quick exploration and understanding are necessary. Semantic summaries have been proposed as a key technology enabling the quick understanding and exploration of large knowledge graphs. Among the methods proposed for generating summaries, structural methods primarily exploit the structure of the graph to generate the summaries. Approaches in this area focus on identifying the most important nodes and usually employ a single centrality measure, capturing a specific perspective on the notion of a node’s importance. Moving from one centrality measure to many, however, has the potential to yield a more objective view of node importance, leading to better summaries. In this paper, we present SumMER, the first structural summarization technique that exploits machine learning for RDF/S KGs. SumMER explores eight centrality measures and then uses machine learning to optimally select the most important nodes. These nodes are then linked to form a subgraph of the original graph. We experimentally show that combining centrality measures with machine learning effectively increases the quality of the generated summaries.
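    A minimal Python sketch of the overall shape of such a pipeline, under stated assumptions: two centrality measures (degree and closeness) stand in for the eight the paper explores, plain logistic regression trained by batch gradient descent stands in for its machine-learning step, and the training labels (which nodes belong to a reference summary) are hypothetical. The selected nodes would then be linked into a subgraph, for example through shortest paths; that step is omitted here.

```python
import math
from collections import deque


def centrality_features(adj):
    """Two stand-in centrality measures per node (degree, closeness), scaled to [0, 1]."""
    degree = {n: len(adj[n]) for n in adj}
    closeness = {}
    for src in adj:
        dist = {src: 0}
        queue = deque([src])
        while queue:
            node = queue.popleft()
            for nxt in adj[node]:
                if nxt not in dist:
                    dist[nxt] = dist[node] + 1
                    queue.append(nxt)
        total = sum(dist.values())
        closeness[src] = (len(dist) - 1) / total if total else 0.0
    max_deg = max(degree.values()) or 1
    max_clo = max(closeness.values()) or 1.0
    return {n: [degree[n] / max_deg, closeness[n] / max_clo] for n in adj}


def train_logistic(X, y, lr=0.5, epochs=2000):
    """Logistic regression fitted by batch gradient descent on the summed log-loss."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for xi, yi in zip(X, y):
            z = sum(wj * xj for wj, xj in zip(w, xi)) + b
            err = 1.0 / (1.0 + math.exp(-z)) - yi  # predicted probability minus label
            grad_w = [g + err * xj for g, xj in zip(grad_w, xi)]
            grad_b += err
        w = [wj - lr * g for wj, g in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def select_important(adj, w, b, k):
    """Rank nodes by the learned score and keep the top k."""
    feats = centrality_features(adj)
    score = {n: sum(wj * xj for wj, xj in zip(w, f)) + b for n, f in feats.items()}
    return sorted(adj, key=lambda n: (-score[n], n))[:k]


# Hypothetical training graph and labels (1 = node appears in a reference summary).
train = {"H": ["L1", "L2", "L3"], "L1": ["H"], "L2": ["H"], "L3": ["H"]}
labels = {"H": 1, "L1": 0, "L2": 0, "L3": 0}
feats = centrality_features(train)
nodes = sorted(train)
w, b = train_logistic([feats[n] for n in nodes], [labels[n] for n in nodes])
print(select_important(train, w, b, 1))  # → ['H']
```

    Learning the weights, rather than fixing a single measure, is what lets several centrality perspectives be combined; in practice the model would be trained on one set of graphs and applied to new ones.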

    HInT: Hybrid and Incremental Type Discovery for Large RDF Data Sources

    The rapid explosion of linked data has resulted in many weakly structured and incomplete data sources, where typing information might be missing. On the other hand, type information is essential for a number of tasks such as query answering, integration, summarization, and partitioning. Existing approaches for type discovery either completely ignore type declarations available in the dataset (implicit type discovery approaches) or rely only on existing types in order to complement them (explicit type enrichment approaches). Implicit type discovery approaches are based on instance grouping, which requires an exhaustive comparison between the instances. This process is expensive and not incremental. Explicit type enrichment approaches, on the other hand, are not able to identify new types, and they cannot process data sources that have little or no schema information. In this paper, we present HInT, the first incremental and hybrid type discovery system for RDF datasets, enabling type discovery in datasets where type declarations are missing. To achieve this goal, we incrementally identify the patterns of the various instances, index them, and then group them to identify the types. While processing an instance, our approach exploits its type information, if available, to improve the quality of the discovered types by guiding the classification of the new instance into the correct group and by refining the groups already built. We show analytically and experimentally that our approach dominates competitors from both worlds, implicit type discovery and explicit type enrichment, in terms of efficiency, while outperforming them in terms of quality in most cases.
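    The abstract does not specify how patterns are represented, indexed, or compared. A minimal Python sketch of incremental, type-guided grouping under our own assumptions: an instance's pattern is the set of properties it uses, an exact-pattern dictionary serves as the index (so a repeated pattern needs no comparison), patterns are compared with Jaccard similarity against a hypothetical `threshold`, and a declared type restricts the candidate groups to those already carrying that type. HInT's actual indexing and grouping may differ.

```python
def jaccard(a, b):
    """Similarity of two property sets."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


class IncrementalTyper:
    """Hypothetical incremental, type-aware grouping of RDF instances."""

    def __init__(self, threshold=0.5):
        self.threshold = threshold
        self.groups = []      # each group: {"pattern": set, "members": list, "types": set}
        self.by_pattern = {}  # index: exact instance pattern -> group id
        self.group_of = {}    # instance -> group id

    def add(self, instance, properties, declared_type=None):
        pattern = frozenset(properties)
        gid = self.by_pattern.get(pattern)  # exact pattern seen before: no comparison needed
        if gid is None:
            candidates = list(range(len(self.groups)))
            if declared_type is not None:
                # Use the declared type, if any, to guide the instance to matching groups.
                typed = [g for g in candidates if declared_type in self.groups[g]["types"]]
                candidates = typed or candidates
            best, best_sim = None, -1.0
            for g in candidates:
                sim = jaccard(pattern, self.groups[g]["pattern"])
                if sim > best_sim:
                    best, best_sim = g, sim
            if best is not None and best_sim >= self.threshold:
                gid = best
            else:
                gid = len(self.groups)
                self.groups.append({"pattern": set(), "members": [], "types": set()})
            self.by_pattern[pattern] = gid
        group = self.groups[gid]
        # Refine the group: widen its pattern and record any declared type.
        group["pattern"] |= pattern
        group["members"].append(instance)
        if declared_type is not None:
            group["types"].add(declared_type)
        self.group_of[instance] = gid
        return gid


typer = IncrementalTyper(threshold=0.5)
typer.add("alice", {"name", "birthDate", "worksFor"}, "Person")
typer.add("bob", {"name", "birthDate"})  # untyped: grouped with alice by similarity
typer.add("acme", {"name", "foundedIn", "locatedIn"}, "Organization")
typer.add("carol", {"name", "worksFor"}, "Person")
print(typer.groups[typer.group_of["bob"]]["types"])  # → {'Person'}
```

    Because each instance is compared only with existing groups, never with all previous instances, the work per insertion grows with the number of groups rather than with the dataset, which is what makes the process incremental.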